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This paper investigates the cross-correlations across multiple cli- 
mate model errors. We build a Bayesian hierarchical model that ac- 
counts for the spatial dependence of individual models as well as 
cross-covariances across different climate models. Our method allows 
for a nonseparable and nonstationary cross-covariance structure. We 
^ , also present a covariance approximation approach to facilitate the 

■^^ ' computation in the modeling and analysis of very large multivariate 

spatial data sets. The covariance approximation consists of two parts: 
^ . a reduced-rank part to capture the large-scale spatial dependence, 

C/3 ' and a sparse covariance matrix to correct the small-scale dependence 

error induced by the reduced rank approximation. We pay special 

attention to the case that the second part of the approximation has 

►^ ' a block-diagonal structure. Simulation results of model fitting and 

rT) • prediction show substantial improvement of the proposed approxima- 

fT^ ' tion over the predictive process approximation and the independent 

blocks analysis. We then apply our computational approach to the 
joint statistical modeling of multiple climate model errors. 



1. Introduction. This paper addresses the problem of combining mul- 
tiple climate model outputs while accounting for dependence across differ- 
ent models as well as spatial dependence within each individual model. To 
study the impact of human activity on climate change, the Intergovernmen- 
tal Panel on Climate Change (IPCC) is coordinating efforts worldwide to 
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C^ . develop coupled atmosphere-ocean general circulation models (AOGCMs 
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Various organizations around the world are developing state-of-the-art nu- 
merical models and currently 20+ climate models are available. A growing 
body of literature also exists that studies multiple climate model outputs 
[e.g., Tebaldi et al. (2005); Furrer et al. (2007); Jun, Knutti and Nychka 
(2008a, 2008b); Smith et al. (2009); Sain and Furrer (2010); Christensen 
and Sain (2010); Sain, Furrer and Cressie (2011)]. One important research 
problem of interest regarding these climate model outputs is to investi- 
gate the cross-correlations across climate model errors [Tebaldi and Knutti 
(2007); Jun, Knutti and Nychka (2008a, 2008b); Knutti et al. (2010b)]. Cli- 
mate models are constantly copied and compared, and successful approxima- 
tion schemes are frequently borrowed from other climate models. Therefore, 
many of the climate models are dependent to some degree and thus may 
be expected to have correlated errors. For similar reasons, we may expect 
correlations to be higher between the models developed by the same orga- 
nization. Indeed, Jun, Knutti and Nychka (2008a, 2008b) quantified cross- 
correlations between pairs of climate model errors at each spatial location 
and the results show that many climate model errors have high correlations 
and some of the models developed by the same organizations have even 
higher correlated errors. Note that throughout the paper, we use the termi- 
nology "error" rather than "bias" to describe the discrepancy between the 
climate model output and the true climate. The reason is that we consider 
the "error" as a stochastic process rather than a deterministic quantity. 

In this paper we build a joint statistical model for multiple climate model 
errors that accounts for the spatial dependence of individual models as well 
as cross-covariance across different climate models. Our model offers a non- 
separable cross-covariance structure. We work with a climate variable, sur- 
face temperature, from multiple global AOGCM outputs. We include sev- 
eral covariates such as latitude, land/ocean effect, and altitude in the mean 
structure. The marginal and cross-covariance structure of climate model er- 
rors are modeled using a spatially varying linear model of co-regionalization 
(LMC). The resulting covariance structure is nonstationary and is able to 
characterize the spatially varying cross-correlations between multiple cli- 
mate model errors. Our modeling approach complements in many ways the 
previous work of Jun, Knutti and Nychka (2008b). Jun, Knutti and Nychka 
(2008b) used kernel smoothing of the products of climate model errors to 
obtain cross-correlations and did not formally build joint statistical mod- 
els for multiple climate model outputs. One direct product of our model 
is a continuous surface map for the cross-covariances. Our Bayesian hier- 
archical modeling approach also provides uncertainty measures of the esti- 
mations of the spatially varying cross-covariances, while it is a challenging 
task to achieve for the kernel smoothing approach. In Jun, Knutti and Ny- 
chka (2008b) they fit each climate model error separately to a univariate 
regression model before obtaining the kernel estimate of cross-correlations. 
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They only considered land/ocean effect in tlie covariance structure. In our 
approacli, we not only include the land/ocean effect but also altitude and lat- 
itude in the cross-covariance structure of the climate model errors. As a sta- 
tistical methodology, joint modeling is more efficient in estimating model 
parameters. Moreover, our approach is able to produce spatial prediction 
or interpolation and thus is potentially useful as a statistical downscaling 
technique for multivariate climate model outputs. 

The naive implementation of our modeling approach faces a big com- 
putational challenge because it requires repeated calculations of quadratic 
forms in large inverse covariance matrices and determinants of large covari- 
ance matrices, needed in the posterior distribution and likelihood evaluation. 
This is a well-known challenge when dealing with large spatial data sets. 
Various methods have been proposed in the literature to overcome this chal- 
lenge, including likelihood approximation [Vecchia (1988); Stein, Chi and 
Welty (2004); Puentes (2007); Caragea and Smith (2007)], covariance taper- 
ing [Furrer, Genton and Nychka (2006); Kaufman, Schervish and Nychka 
(2008)], Gaussian Markov random-field approximation [Rue and Tjelme- 
land (2002); Rue and Held (2005)] and reduced rank approximation [Higdon 
(2002); Wikle and Cressie (1999); Ver Hoef, Cressie and Barry (2004); Kam- 
mann and Wand (2003); Cressie and Johannesson (2008); Banerjee et al. 
(2008)]. Most of these methods focus on univariate processes. One of the ex- 
ceptions is the predictive process approach [Banerjee et al. (2008)]. Although 
the predictive process approach is applicable to our multivariate model, it 
is not a perfect choice because it usually fails to accurately capture local, 
small-scale dependence structures [Finley et al. (2009)]. 

We develop a new covariance approximation for multivariate processes 
that improves the predictive process approach. Our covariance approxima- 
tion, called the full-scale approximation, consists of two parts: The first 
part is the same as the multivariate predictive process, which is effective 
in capturing large-scale spatial dependence; and the second part is a sparse 
covariance matrix that can approximate well the small-scale spatial depen- 
dence, that is, unexplained by the first part. The complementary form of our 
covariance approximation enables the use of reduced-rank and sparse matrix 
operations and hence greatly facilitates computation in the application of 
the Gaussian process models for large data sets. Although our method is de- 
veloped to study multiple climate model errors, it is generally applicable to 
a wide range of multivariate Gaussian process models involving large spatial 
data sets. The full-scale approximation has been previously studied for uni- 
variate processes in Sang and Huang (2010), where covariance tapering was 
used to generate the second part of the full-scale approximation. In addition 
to the covariance tapering, this paper considers using block diagonal covari- 
ance for the same purpose. Simulation results of model fitting and prediction 
using the full-scale approximation show substantial improvement over the 
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predictive process and the independent blocks analysis. We also find that 
using the block diagonal covariance achieves even faster computation than 
using covariance tapering with comparable performance. 

Our work contributes to the literature of statistical modeling of climate 
model outputs. Despite high correlations between the errors of some climate 
models, it has been commonly assumed that climate model outputs and/or 
their errors are independent of each other [Giorgi and Mearns (2002); Tebaldi 
et al. (2005); Green, Goddard and Lall (2006); Furrer et al. (2007); Smith 
et al. (2009); Furrer and Sain (2009); Tebaldi and Sanso (2009)]. Only re- 
cently, several authors have attempted to build joint models for multiple cli- 
mate model outputs that account for the dependence across different climate 
models. For example, Sain and Furrer (2010) used a multivariate Markov 
random-field (MRF) approach to analyze 20-year average precipitation data 
from five different regional climate model (RCM) outputs. They presented 
a method to combine the five RCM outputs as a weighted average and the 
weights are determined by the cross-covariance structure of different RCM 
outputs. Sain, Furrer and Cressie (2011) built multivariate MRF models for 
the analysis of two climate variables, precipitation and temperature, from 
six ensemble members of an RCM (three ensemble members for each climate 
variable). Their model is designed for what they call simple ensembles that 
resulted from the perturbed initial conditions of one RCM, not for multiple 
RCMs. Unlike our approach, the MRF models are best suited for lattice 
data, and it might be difficult for their approach to incorporate covariates 
information into the cross-covariance structure. Christensen and Sain (2010) 
used the factor model approach to integrate multiple RCM outputs. Their 
analysis is done at each spatial location independently and, thus, the spatial 
dependence across grid pixels is not considered in the analysis. We believe 
our modeling and computational method provides a fiexible alternative to 
these existing approaches in joint modeling of climate model outputs. 

The remainder of the paper is organized as follows. In Section 2 we detail 
the motivating data set of climate model errors and present a multivariate 
spatial process model to analyze the data. Section 3 introduces a covariance 
approximation method to address the model fitting issues with large mul- 
tivariate spatial data sets. In Section 4 we present a simulation study to 
explore properties of the proposed computational method. The full analysis 
of the climate errors data is then offered in Section 5. Finally, we conclude 
the paper in Section 6. 

2. Data and the multivariate spatial model. 

2.1. Data. Our goal is to build a joint statistical model for multiple cli- 
mate model errors. By climate model error, we mean the difference between 
the climate model output and the corresponding observations. We use sur- 
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Table 1 

The names of modeling groups, country, IPCC I.D. and resolutions of the IPCC model 

outputs used m the study. The resolution of the observations is 5° x 5° 



Group 



Country 



IPCC I.D. 



Resolution 
(longitude X latitude) 



1 US Dept. of Commerce/NOAA/ 

Geophysical Fluid 
Dynamics Laboratory 

2 US Dept. of Commerce/NOAA/ 

Geophysical Fluid 
Dynamics Laboratory 

3 Hadley Centre for Climate 

Prediction and Research/ 
Met Office 

4 Hadley Centre for Climate 
Prediction and Research/ 

Met Office 

5 LASG/fnstitute of 
Atmospheric Physics 



USA GFDL-CM2.0 



USA GFDL-CM2.1 



UK UKMO-HadCM3 



UK UKMO-HadGEMl 



China FGOALS-gl.O 



2.5° X 2° 



O ., r\0 



2.5° X 2 



o ^^ n An° 



3.79° X 2.47 



1.875° X 1.24 



2.81° X 3° 



face temperature (unit: K) with global coverage obtained from the AOGCM 
simulations as well as the observations. The observations are provided by 
the Climate Research Unit (CRU), East Anglia, and the Hadley Centre, UK 
MetOfiice [Jones et al. (1999); Rayner et al. (2006)]. The data (both climate 
model outputs and observations) are monthly averages given on a regular 
spatial grid (the resolution is 5° x 5°). A list of the climate models used 
in this study is given in Table 1. These climate models are developed by 
different modeling groups worldwide, under the coordination of the IPCC, 
and they use common initial conditions. Each model has their own grid res- 
olution. We consider the 30 year interval 1970-1999 and due to the lack of 
observations near the polar area, we only consider temperature data taken 
between latitude 45° S and 72° N, with the full longitude ranging from 
180° W to 180° E. We still have missing observations at a few pixels and 
we impute by taking the average of the spatially neighboring cells (all eight 
neighboring cells if all are available). We study climatological mean state in 
the sense that we first get a seasonal average temperature (e.g., an average of 
the monthly temperature from December to February) and then average it 
over 30 years. We denote this 30 year average of Boreal winter temperature 
by DJF. 

As mentioned earlier, we calculate model errors by taking the difference 
between climate model outputs and actual observations. However, as shown 
in Table 1, the climate model outputs and the observations are at different 
spatial grid resolutions. Since the observations have the coarsest grid, we use 
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the bilinear interpolation of the model output to the observational grid. That 
is, at each grid pixel (in the observation grid), we use the weighted average of 
model outputs at the four nearest pixels in the model output grid. Figure 2 
displays pictures of the climate model errors for the five climate models listed 
in Table 1. First note that the errors of model 5 are significantly larger in 
the higher latitudes and in the Himalayan area compared to the errors of the 
other models. The climate model error maps exhibit quite similar patterns 
among the models developed by the same group, although the errors for 
model 1 in the mid-latitude area of the Northern Hemisphere appear to be 
larger in magnitude compared to those of model 2. All the models have 
larger errors in the Northern Hemisphere and the errors are large over the 
high altitude area and high latitude area. 

We focus on climate model errors in this paper. Since the climate model 
error is defined as the difference between a climate model's output and 
the corresponding observations (we use the observation minus the model 
output), we are effectively building a joint model for multiple climate model 
outputs with the observations as their means (although we have to be careful 
about the sign of the fixed mean part). It would be tantalizing to build an 
elaborate joint statistical model of multiple climate model outputs as well 
as the observations directly at their original spatial grid resolutions. This 
direct modeling approach would require a statistical representation of the 
true climate and of the climate model outputs, both of which are challenging 
tasks. It would be hard to model the true climate in a reasonably complex 
way and the characteristics of observation errors and model errors may not 
be simple. Therefore, we do not pursue this direction in this paper. 

2.2. Multivariate spatial regression models. Let Y(s) = {Yi{s),..., 
Yji[s)) be an i? X 1 response vector along with a p x R matrix of regres- 
sors X(s) = (Xi(s), . . . ,X/j(s)) observed at location s € -D, where D is the 
region of interest. For our application, li(s) represents the error of climate 
model i at location s for i = 1,...,5, Xj(s) the corresponding covariates, 
and D represents the surface of the globe. We consider the multivariate 
spatial regression model 

(1) Y(s) = X^(s)/3 + w(s) + e(s), seDCR'^, 

where /3 = (/3i, . . . ,/3p) is a p x 1 column vector of regression coefficients, 
w(s) is a multivariate spatial process whose detailed specification is given 
below, and the process £:(s) = (ei(s), . . . ,£^(3))-^ models the measurement 
error for the responses. The measurement error process is typically assumed 
to be spatially independent, and at each location, e(s) -^ MVN(0, Se), where 
MVN stands for the multivariate normal distribution, and S^ is an Rx R 
covariance matrix. 
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As a crucial part of model (1), the spatial process w(s) = {wi{s),. . . , 
'Wr{s))'^ captures dependence both within measurements at a given site and 
across the sites. We model ■w(s) as an i?-dimensional zero-mean multivari- 
ate Gaussian process: w(s) ~ MVN(0,rw(-, •))) where the cross-covariance 
matrix function of w(s) is defined as rw(s, s') = [Co'v{wr{s),Wr'{s'))]^^,^-^. 
For any integer n and any collection of sites 5 = (si, . . . , s„), we denote the 
multivariate realizations of w(s) at S as an nR x 1 vector w = (■w-^(si), . . . , 
w (s„)) , which follows an nR x 1 multivariate normal distribution w ~ 
MVN(0,Sw), where Sw = [rw(si,Sj)]" ■ -^ is an nR x nR matrix that can 
be partitioned as an n x n block matrix with the (i,j)th block being the 
Rx R cross-covariance matrix T-^{si,Sj). In the multivariate setting, we 
require a valid cross-covariance function such that the resultant nR x nR 
covariance matrix, 'S-^, is positive definite. 

There have been many works on the construction of flexible cross-covarian- 
ce functions [Mardia and Goodall (1993); Higdon, Swall and Kern (1999); 
Gaspari and Cohn (1999); Higdon (2002); Ver Hoef, Cressie and Barry 
(2004); Gelfand et al. (2004); Majumdar and Gelfand (2007); Gneiting, 
Kleiber and Schlather (2010); Jun (2009); Apanasovich and Genton (2010)]. 
The model in Mardia and Goodall (1993) assumes separable cross-covarian- 
ce functions. Higdon (2002) employs discrete approximation to the kernel 
convolution based on a set of prespecified square-integrable kernel functions. 
The model in Gneiting, Kleiber and Schlather (2010) is isotropic and the 
covariance model requires the same spatial range parameters when there are 
more than two processes. The model of Apanasovich and Genton (2010) is 
developed to model stationary processes and its extension to nonstationary 
processes is not obvious. 

We adopt the LMC approach [Wackernagel (2003); Gelfand et al. (2004)] 
which has recently gained popularity in multivariate spatial modeling due 
to its richness in structure and feasibility in computation. Suppose that 
U(s) = [[/g(s)]g=i:Q is a Q X 1 process with each Uq{s) independently mod- 
eled as a univariate spatial process with mean zero, unit variance and corre- 
lation function pq{-,-). The cross-covariance function of U(s) is a diagonal 
matrix that can be written as ru(s,s') = 0jLi /'(?(s,s'), where is the 
direct sum matrix operator [e.g., Harville (2008)]. The LMC approach as- 
sumes that w(s) = A(s)U(s), where A(s) is an i? x Q transformation matrix, 
that is, nonsingular for all s. For identifiability purposes and without loss 
of generality, A(s) can be taken to be a lower-triangular matrix. It follows 
that the constructed cross-covariance function for w(s) under this model 
is rw(s,s') = A(s)ru(s,s')A^(s'). An alternative expression for the cross- 
covariance function is rw(s,s') = X]z=iaq(s)aJ(s')pg(s,s'), where ag(s) is 
the qth column vector of A(s). The cross-covariance matrix for the realiza- 
tions w at n locations S can be written as a block partitioned matrix S^ 
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with n X n blocks, whose (i,j)th block is A(sj)ru(sj,Sj)A (sj^ 
express Xlw as 



We can 



(2) 



0A(s.) 



ep.( 



Sj , Sj I 



Lg=l 



-I «J= 



eA^ 



: ^Su»4^ 



where „4 is a block-diagonal matrix with n x n blocks whose ith diagonal 
block is A(sj), ©Z=i pq(sj, Sj) is a Q x Q diagonal matrix with Pg(sj, Sj)'s as 
its diagonals, and Su is an (n x n)-block partitioned matrix with ru(sj,Sj) 
as its (i,j)th block. 

For our climate model application, we utilize the multivariate model (1). 
Here, we have R = 5 since five climate model errors are considered. In the 
LMC model, the number of latent processes, Q, can take a value from 1 to R. 
When Q < R, the multivariate process is represented in a lower dimensional 
space and dimensionality reduction is achieved. In this paper, our goal is 
to obtain a rich, constructive class of multivariate spatial process models 
and, therefore, we assume a full rank LMC model with Q = R = 5. We 
do not include the spatial nugget effect e in the model. For the regression 
mean, X (s)/3, we use p = 8 covariates: Legendre polynomials [Abramowitz 
and Stegun (1964)] in latitude of order to 4 with sine of latitude as their 
arguments, an indicator of land/ocean (we give 1 if the domain is over the 
land and otherwise), longitude (unit: degree), and the altitude (altitude 
is set to be zero over the ocean). We scale the altitude variable (unit: m) 
by dividing it by 1000 to make its range comparable to other covariates. 
Our covariates specification is similar to Sain, Furrer and Cressie (2011) 
except that we use multiple terms for latitude effects in order to have enough 
flexibility to capture the dependence of the process on the entire globe. 

For each location s, we model A(s) as a 5 x 5 lower triangular ma- 
trix. Two model specifications for A(s) are considered. In one specification, 
we assume the linear transformation to be independent of space, that is, 
A(s) = A. Thus, we have A(s) = [aij]ij=i:5 with Uij being nonzero constants 
for 1 < i < i < 5 and Ojj = for i > j. To avoid an identifiability problem, we 
let an > for i = 1, . . . , 5. If the Uq{s) are stationary, this specification results 
in a stationary cross-covariance structure. In particular, if the Uq{s) are iden- 
tically distributed, this specification results in a separable cross-covariance 
structure similarly to Sain and Furrer (2010). In the second specification, 
we assume that A(s) vary over space. Specifically, the (i, j)th entry of A(s), 
aij{s), is modeled as ajj(s) = X^(s)?7jj for 1 < i < j < 5, where Xyi(s) is 
a 5 X 1 covariate vector at location s and consists of Legendre polynomials 
in latitude of order to 2, an indicator of land/ocean, and the scaled alti- 
tude. The dimension of ry^ • is 5 x 1. This specification induces nonstationary 
and nonseparable cross-covariance structure. Note that, similar to the first 
specification, to avoid an identifiability problem, we let ajj(s) = |X^(s)t7jJ 
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for i = 1, . . . , 5. Gelfand et al. (2004) proposed to model each element of A(s) 
as a spatial process, but in practice such an approach is usually computa- 
tionally too demanding to be used for large scale problems. Moreover, there 
might be identifiability problems if we do not constrain some of the elements 
in A(s). Thus, we do not consider this option in our paper. For the correla- 
tion function of each latent process Uq (s) , we consider a Matern correlation 
function on a sphere, where the chordal distance is used. For any two lo- 
cations on the globe {Li,li),i = 1,2, where Li and li denote latitude and 
longitude, respectively, the chordal distance is defined as 



ch((Li,/i),(L2,/2)) 



Lt — Lo\ , , . o / /l — /^ ^ ' 



= 2/{earth < Sm ( I + COS Li COS L2 SUl ( 

Here -Rearth is the radius of the Earth. Jun, Knutti and Nychka (2008b) 
showed that the maximum likelihood estimates of the smoothness parame- 
ters for all of the climate models in Table 1 (except for model 4) are close 
to 0.5. Therefore, we fix the smoothness parameter values for all the pro- 
cesses in U to be 0.5 and this is the same as using an exponential correlation 
function for each Uq{s). 

3. Model implementation and covariance approximation. 

3.1. Model fitting. We adopt a Bayesian approach that specifies prior dis- 
tributions on the parameters. Posterior inference for the model parameters 
is implemented by model fitting with Gibbs samplers [Gelfand and Smith 
(1990)] and Metropolis-Hastings updating [Gelman et al. (2004)]. We set (3 
to have a p-dimensional multivariate normal distribution prior with large 
variances. The prior assignment for the covariance parameters for each Uj 
depends upon the specific choice of the correlation functions. In general, 
the spatial range parameters are weakly identifiable, and, hence, reasonably 
informative priors are needed for satisfactory MCMC behavior. We set prior 
distributions for the spatial range parameters relative to the size of their 
domains, for instance, by setting the prior means to refiect one's prior belief 
about the practical spatial range of the data. For the LMC setting with 
a constant A, we may assign truncated normal priors with positive value 
support or inverse gamma priors with infinite variances for the diagonal en- 
tries, and normal priors for other entries. For the spatially varying LMC 
setting, the priors for the coefficients rf^j are normal distributions with large 
variances. 

Given n locations in the set S = {si,...,s„}, the realization of the re- 
sponse vector at these locations can be collectively denoted as Y = (Y(si)-^, 
. . . , Y(s„)-^)-^, and the corresponding matrix of covariates is X = (X(si), . . . , 
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X(s„,)) . The data likelihood can be obtained easily from the fact that 
Y ~ MVN(X/3, Sw + 1„ «) Sle), where Sw is given by (2). Generically denot- 
ing the set of all model parameters by O, the MCMC method is used to draw 
samples of the model parameters from the posterior: p(ri|Y) oc P(ri)P(Y|ri). 
Assuming the prior distribution of (5 is MVN(/Xfl,S^), the posterior sam- 
ples of /3 are updated from its full conditional MVN(/x^|., S^|.), where S^|. = 
[I]-^ + X^(5]w + InOSe)-iX]-i,and/x^|. = I]^|.X^(Sw + In^Se)-iY. All 
the remaining parameters have to be updated using Metropolis-Hastings 
steps. 

Spatial interpolation in the multivariate case allows one to better esti- 
mate one variable at an unobserved location by borrowing information from 
co-located variables. It is also called "cokriging" in geostatistics. The multi- 
variate spatial regression model provides a natural way to do cokriging. For 
example, under the Bayesian inference framework, the predictive distribu- 
tion for Y(so) = [^(so)]j=i:p at a new location sq is a Gaussian distribution 
with 

E[Y(so)|0, Y] = X^(so)/3 + h(so)(Sw + In ® 5]^)"^^ " X/3) 

and 

Cov[Y(so)jf), Y] = rw(so,so) + T.,- h(so)(5]w + In ® S^)-^h^(so), 

where h(so) = [rw(so,Si)]i=i:n is the R x nR cross-covariance matrix be- 
tween w(so) and {w(sj),i = 1, . . . ,n}. 

The computationally demanding part in the model fitting is to calculate 
the quadratic form of the inverse of the nR x nR matrix S^v + In '8' Sg , 
whose computational complexity is of the order 0{{nR)^). This computa- 
tional issue is often referred to as "the big n problem." In our climate model 
application, we have 1,656 locations for each of the five climate models, 
that is, n = 1,656 and R = 5. Although the inversion of the matrix can be 
facilitated by the Cholesky decomposition and linear solvers, computation 
remains expensive when nR is big, especially for the spatially varying LMC 
models which involve a relatively large number of parameters and hence re- 
quire multiple likelihood evaluations at each MCMC iteration. We introduce 
covariance approximation below as a way to gain computational speedup for 
the implementation of the LMC models. 

3.2. Predictive process approximation. In this subsection we review the 
multivariate predictive process approach by Banerjee et al. (2008) and point 
out its drawbacks to motivate our new covariance approximation method. 
The predictive process models consider a fixed set of "knots" S* = {s*, . . . , 
s^} that are chosen from the study region. The Gaussian process ■w(s) in 
model (1) yields an mR random vector w* = [■w'(s|)]^^ over S*. The Best 
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Linear Unbiased Predictor (BLUP) of ■w(s) at any fixed site s based on w* 
is given by w(s) = i?{w(s)|'w*}. Being a conditional expectation, it imme- 
diately follows that w(s) is an optimal predictor of ■w(s) in the sense that it 
minimizes the mean squared prediction error £^{||w(s) — f(w*)|p} over all 
square-integrable (vector- valued) functions f(w*) for Gaussian processes, 
and over all linear functions without the Gaussian assumption. Banerjee 
et al. (2008) refer to w(s) as the predictive process derived from the parent 
process w(s). 

Since the parent process w(s) is an i?-dimensional zero-mean multivariate 
Gaussian process with the cross-covariance function rw(s,s') = Cov(w(s), 
w(s')), the multivariate predictive process has a closed- form expression 

(3) w(s) = Cov(w(s),w*)Var~i(w*)w*=Cw(s,5*;6>)C;r^(6')w*, 

where Cw(s,5*;0) = [rw(s,s*;0), . . . ,rw(s,s;^;0)] is an i? x mR cross- 
covariance matrix between w(s) and {w(s*), s* E 5*}, and C^{0) = [rw(s*, s*: 
9)]Y^-^ is the mR x mR cross-covariance matrix of w* = [w(s*)]™^-^ [see, e.g., 
Banerjee, Carlin and Gelfand (2004)]. 

The multivariate predictive process w(s) is still a zero mean Gaussian pro- 
cess, but now with a fixed rank cross-covariance function given by rw(s, s') = 
Cw(s, 5*; 0)Q-'H^)Cw(s', 5*; 0). Let w = [u}(si)]^^i be the realization of w(s) 
at the set S of the observed locations. It follows that w ~ MVN(0,Sw)) 
where T,^ = C^{S,S*;d)C:;,-\0)Cl{S,S*;0), and C^{S,S*;e) is an nR x 
m,R matrix that can be partitioned as an n x 1 block matrix whose ith block 
is given by C^{si,S*;6). 

Replacing w(s) in (1) with the fixed rank approximation w(s), we obtain 
the following multivariate regression model: 

(4) Y(s)=X^(s)/3 + W(s)+£(s), 

which is called the predictive process approximation of model (1). Based on 
this approximation, the data likelihood can be obtained using Y ~ MVN(X/3, 
Cw(5, S*;e)C;,~^{e)Cl{S, 5*; 0)) + I„ ^ T,e). When the number of knots m is 
chosen to be substantially smaller than n, computational gains are achieved 
since the likelihood evaluation that initially involves inversion of nR x nR 
matrices can be done by inverting much smaller mR x mR matrices. 

The multivariate predictive process is an attractive reduced rank approach 
to deal with large spatial data sets. It encompasses a very flexible class of 
spatial cross-covariance models since any given multivariate spatial Gaussian 
process with a valid cross-covariance function would induce a multivariate 
predictive process. Since the predictive process is still a valid Gaussian pro- 
cess, the inference and prediction schemes for multivariate Gaussian spatial 
process models that we described in Section 3.1 can be easily implemented 
here. 
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However, the predictive process models share one common problem with 
many other reduced rank approaches: They generally fail to capture lo- 
cal/small scale dependence accurately [Stein (2008); Finley et al. (2009); 
Banerjee et al. (2010)] and thus lead to biased parameter estimations and 
errors in prediction. 

To see the problems with the multivariate spatial process w{s) in (1), we 
consider the following decomposition of the parent process: 

(5) w(s) = w(s) + (vif(s) — w(s)). 

We call w(s) — w(s) the residual process. The decomposition in (5) immedi- 
ately implies a decomposition of the covariance function of the process w(s): 

(6) rw(s, s') = rw(s, s') + (rw(s, s') - rw(s, s')), 

where rw(s,s') — rw(s,s') is the cross-covariance function of the residual 
process w(s) — w(s). Note that for any arbitrary set of n locations S, 
Sw — Sw = [T^{si,Sj)]'^.j^i — [r^{si,Sj)]'^j^i is the conditional variance- 
covariance matrix of w given w*, and hence a nonnegative definite matrix. 
Using the multivariate predictive process to approximate w(s), we discard 
the residual process 5]w — ^\v entirely and thus fail to capture the depen- 
dence it carries. This is indeed the fundamental issue that leads to biased 
estimation in the model parameters. 

To understand and illustrate the issue with the predictive process due 
to ignoring the residual process, we consider a univariate stationary Gaus- 
sian process and we remark that the multivariate predictive process shares 
the same problems. Assume the covariance function of the parent process is 
Ci„(s,s') = a'^p^{s,s'), where pw{s,s') is the correlation function and a^ is 
the variance, that is, constant over space. Assume r^ is the nugget variance. 
The variance of the corresponding predictive process at location s is given by 
<T?j(s) =a'^p'^{s,S*)p~^{S*,S*)pw{s,S*), where p^(s,5*) is the correlation 
vector between u;(s) and {w(s*),s* G 5*}, and p^(5*,5*) is the correla- 
tion matrix of the realizations of w{s) at the knots in the set S*. From the 
nonnegative definiteness of the residual covariance function, we obtain the 
inequality o"^(s) > o"?^(s). Equality holds when s belongs to S*. Therefore, 
the predictive process produces a lower spatial variability. Banerjee et al. 
(2010) proved that there are systematic upward biases in likelihood-based 
estimates of the spatial variance parameter o"^ and the nugget variance r^ 
using the predictive process model as compared to the parent model. In- 
deed, the simulation results in Finley et al. (2009) and Banerjee et al. (2010) 
showed that both a'^ and r^ are significantly overestimated especially when 
the number of knots is small. Our simulation study to be presented in Sec- 
tion 4 also shows that predictive processes produce biased estimations of 
model parameters in the multivariate spatial case. 
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3.3. Full-scale covariance approximation. Motivated by the fact that dis- 
carding the residual process e{s) = ■w(s) — w(s) is the main cause of the 
problem associated with the multivariate predictive process, we seek to com- 
plement the multivariate predictive process by adding a component that 
approximates the residual cross-covariance function while still maintaining 
computational efficiency. We approximate the residual cross-covariance func- 
tion by 

(7) re(s, s') = [rw(s, s'; e) - rw(s, s'; 6)] o /C(s, s'), 

where o denotes the Schur product (or entrywise product) of matrices, and 
the matrix- valued function /C(s,s'), referred to as the modulating function, 
has the property of being a zero matrix for a large proportion of possible 
spatial location pairs (s,s'). The zeroing property of the modulating func- 
tion implies that the resulting cross-covariance matrix is sparse and, thus, 
sparse matrix algorithms are readily applicable for fast computation. We will 
introduce below modulating functions that have zero value when s and s' 
are spatially farther apart. For such choices of the modulating function, the 
effect of multiplying a modulating function in (7) is expected to be small, 
since the residual process mainly captures the small scale spatial variability. 
Combining (6) with (7), we obtain the following approximation of the 
cross-covariance function: 

(8) rt^(s,s') = rw(s,s') + re(s,s'). 

Note that the first part of the cross-covariance approximation, T^, is the 
result of the predictive process approximation and should capture well the 
large scale spatial dependence, while the second part, r§, should capture 
well the small scale spatial dependence. We refer to (8) as the full-scale 
approximation (FSA) of the original cross-covariance function. 

Using the FSA, the covariance matrix of the data Y from model (1) is 
approximated by 

(9) SY = Sw + S^ + In0 5]^, 

where Sw = Cw(5,5*; 6>)C;r^(6>)CX(5,5*;6>), and ^i = [Ti{si,sj;9)]l^^^. 
The structure of the covariance matrix (9) allows efficient computation of 
the quadratic form of its inverse and its determinant. Using the Sherman- 
Woodbury-Morrison formula, we see that the inverse of Xly can be computed 
by 

(10) X {C + Cw(5, S*f{l^e + 1„ Se)~'Cw(5, S*)r' 

xCw(5,5*)^(I]e + In®5],)-\ 
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The determinant is computed as 

det(I]w + Se + I„®S;e) 

(11) =det{C + Cw(cS,cS*)^(Se+In®S,)-lCw(cS,cS*)} 

x{det(C)}"'det(S^ + I„®I], 



^e) 



Notice that Sg + 1^® 5]^ is a sparse matrix and C^ + Cw(5,5*)-^(S§ + 'i-n® 
5]£:)~^Cw('S, 5*) is an niR x mR matrix. By letting m be much smaller than n 
and letting S^ + I„ (8) S^ have a big proportion of zero entries, the matrix 
inversion and the determinant in (10) and (11) can be efficiently computed. 
These computational devices are combined with the techniques described in 
subsection 3.1 for Bayesian model inference and spatial prediction. 

Now we consider one choice of the modulating function, that is, based 
on a local partition of the domain. Let i?i, . . . , Bk be K disjoint subregions 
which divide the spatial domain D. The modulating function is taken to 
be /C(s,s') = 'S-KxK if s and s' belong to the same subregion, and /C(s,s') = 
^KxK otherwise. Voronoi tessellation is one option to construct the disjoint 
subregions [Green and Sibson (1978)]. This tessellation is defined by a num- 
ber of centers c = (ci , . . . , c^) , such that points within Bi are closer to Cj than 
any other center Cj, j ^ i, that is, Bi = {s : ||s — Cj|| < ||s — Cj|[, j ^ i}. Our 
choice of a Voronoi partitioning scheme is made on the ground of tractability 
and computational simplicity. We would like to point out that our method- 
ology is not restricted to this choice and any appropriate partitioning strat- 
egy for the spatial domain could be adopted. Since the modulating function 
so specified will generate an approximated covariance matrix with block- 
diagonal structure, this version of the FSA method is referred to as the 
FSA-Block. 

The FSA-Block method provides an exact error correction for the predic- 
tive process within each subregion, that is, rw(s,s') = rw(s,s') if s and s' 
belong to the same subregion, and rw(s,s') = rw(s,s') if s and s' belong 
to different subregions. Unlike the independent blocks analysis of spatial 
Gaussian process models, the FSA can take into account large/global scale 
dependence across different subregions due to the inclusion of the predictive 
process component. Unlike the predictive process, the FSA can take into 
account the small scale dependence due to the inclusion of the residual pro- 
cess component. An interesting special case of the FSA-Block is obtained by 
taking n disjoint subregions, each of which contains only one observation. 
In this case, £{s) is reduced to an independent Gaussian process, that is, 
re(s,s) =rw(s,s) — rw(s,s), and re(s,s') = for s/s'. In fact, this special 
case corresponds to the modified predictive process by Finley et al. (2009), 
that is, introduced to correct the bias for the variance of the predictive 
process models at each location. Clearly, our FSA-Block is a more general 
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approach since it also corrects the bias for the cross-covariance between two 
locations that are located in the same subregion. 

For the FSA-Block, the computation in (10) and (11) can be further 
simplified. Assuming the set of observed locations is grouped by subregions, 
that is, 5 = {Sbi , • • • , Sbj^}, the nR x nR matrix S^ becomes a K x K block 
diagonal matrix with the kth. diagonal block 

Cw('5bj.,5bj.) — Cw(5Bfc,5*)C^~ (0)C^(5bj.,5*), k = l,. . . ,K, 

and, thus, S^ + 1„ (8> Xl^ is a K x K block diagonal matrix. The inversion 
and determinant of this block diagonal matrix can be directly computed 
efficiently without resorting to general-purpose sparse matrix algorithms if 
the size of each block is not large. 

The computational complexity of the FSA-Block depends on the knot in- 
tensity and the block size. If we take equal-sized blocks, then the computa- 
tional complexity of the log likelihood calculation is of the order 0{nm?R? + 
nh'^R^), where m is the number of knots and h is the block size. This is much 
smaller than the original complexity of 0{n^R?) without using covariance 
approximation. Moreover, the computational complexity of the FSA can be 
further reduced using parallel computation by taking advantage of the block 
diagonal structure of S^. 

An alternative choice of the modulating function in (7) is to use a positive 
definite function, that is, identically zero whenever ||s — s'|| > 7. Such a func- 
tion is usually called a taper function and 7 is called the taper range. The 
resulting FSA method is referred to as the FSA- Taper. In the univariate case, 
any compactly supported correlation function can serve as a taper function, 
including the spherical and a family of Wendland functions [see, e.g., Wend- 
land (1995); Wendland (1998); Gneiting (2002)]. Sang and Huang (2010) 
studied the FSA- Taper and demonstrated its usage for univariate spatial 
processes. For the multivariate processes considered in this paper, the mod- 
ulating function /C(s, s') can be chosen as any valid multivariate taper func- 
tion. One such choice is the matrix direct sum of univariate taper functions, 
that is, /C(s,s') = 0r=i-fi'r(s,s';7r), where K^ is a valid univariate taper 
function with the taper range 7,- used for the rth spatial variable, and differ- 
ent taper ranges can be used for different variables. This cross-independent 
taper function will work well with the FSA if the cross dependence between 
co-located variables can be mostly characterized by the reduced rank pro- 
cess w. Using this taper function, the cross-covariance matrix of the resid- 
ual process, Sl^, can be transformed by row and column permutations to 
a block-diagonal matrix with R diagonal blocks, whose rth diagonal block 
is an n X n sparse matrix with the (i,j)-entry being Cov(er(si),er(sj)) = 
{Cavlwrisi) ,Wr{sj)) — Cov{wr{si),Wr{sj))}Kr{si,Sj;^r), where Wr{s) is the 
reduced rank predictive process for the rth spatial variable and er{s) is 
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the residual process for the rth spatial variable. If we take the same taper 
range for each spatial variable, the computational complexity of the log like- 
lihood calculation is of the order 0{nrn?'B? + ng'^R), where g is the average 
number of nonzero entries per row in the n x n residual covariance matrix 
[Cov(ej.(sj),e,.(sj))]"- j^ for the rth spatial variable. This is a substantial 
reduction from the original computational complexity of 0(n^i?^) without 
using the covariance approximation. 

Use of the FSA involves the selection of knots and the local partitioning 
or tapering strategy. Given the number of knots, we follow the suggestions 
by Banerjee et al. (2010) to use the centers obtained from the K-means 
clustering as the knots [e.g., Kaufman and Rousseeuw (1990)]. A careful 
treatment of the choice of knot intensity m and the number of partitions K 
or the taper range 7 will offer good approximation to the original covariance 
function. Apparently, a denser knot intensity and larger block size or larger 
taper range will lead to better approximation, at higher computational cost. 
In principle, we will have to implement the analysis over different choices 
of m and K or 7 to weigh the trade-off between inference accuracy and 
computational cost. We have used the Euclidean distance and taken m to 
be 225, K to be 36, and 7 to be 10 for the spherical taper function in our 
simulation study, and used the chordal distance and taken m to be 200 
and K to be 10 in the real application and have found that such choices 
work well. A full discussion of this issue will be undertaken in future work. 

4. Simulation results. In this section we report results from a simulation 
study to illustrate the performance of the FSA approach and compare it 
with the predictive process approach and the independent blocks analysis. 
The computer implementation of all the approaches used in this simulation 
study and the analysis of multiple climate models in the following section 
were written in Matlab and run on a processor with dual 2.8 GHz Xeon 
CPUs and 12 GB memory. 

In this simulation study we generated n = 2,000 spatial locations at ran- 
dom from the square [0, 100] x [0, 100] . The data generating model is a bi- 
variate LMC model with a constant 2x2 lower triangular transformation 
matrix A, 

(12) Y(s) = AU(s) + e(s), 

where £(s) ~ MVN(0,r^l2), and U(s) = [Uq{s)]q=i^2 is a 2 x 1 process with 
two independent components, each of which is a univariate spatial process 
with mean zero, unit variance and exponential correlation function. The 
range parameters for Ui{s) and U2{s) are (pi = 10 (i.e., such that the spatial 
correlation is about 0.05 at 30 distance units) and (p2 = 20, respectively. 
The diagonal elements of A are an = 1 and 022 = 0.5, and the nonzero 
off-diagonal element of A is 021 = 0.5. We set the nugget variance r^ = 0.01. 
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Table 2 

The mean and the standard deviations (in parentheses) of the model parameters for the 

full covariance model, the predictive process approximation, the independent blocks 

approximation, the FSA-Block and the FSA-Taper 

Model 01 02 0,11 ai2 a22 t 

True 10 20 1 0.5 0.5 l.OOc-2 

Full model 10.47 (1.02) 22.07 (4.69) 0.99 (0.05) 0.51 (0.03) 0.52 (0.05) l.Ole-2 (3.90o-4) 

Predictive process 13.32 (2.32) 22.71 (7.35) 1.22 (0.08) 0.66 (0.05) 0.49 (0.06) 0.15 (3.46e-3) 

m = 225 
Independent blocks 4.47(0.99) 4.94(1.16) 3.56(0.39) 1.44(0.28) 1.96(0.14) 8.97e-3 (1.18o-3) 

fe = 36 
FSA-Block 11.36 (2.21) 21.17 (5.24) 1.02 (0.09) 0.53 (0.05) 0.49 (0.05) l.lOe-2 (9.00c-4) 

m = 225, fe = 36 
FSA-Taper 14.89 (1.90) 29.92 (7.41) 1.05 (0.07) 0.58 (0.04) 0.50 (0.05) 8.36e-3 (1.07o-3) 

m = 225,7= 10 



Given these data, we used the Bayesian MCMC approach to generate 
samples from the posterior distributions of the model parameters. We as- 
signed Unif(l, (imax/3) priors to the range parameters (^i and (/>2, where dmax 
is the maximum distance of all pairs. The diagonal elements of A and r^ 
were assumed to have the inverse gamma distribution with the shape param- 
eter 2 and the scale parameter 1, IG(2, 1), as priors. We assigned a normal 
prior with large variance for 021- 

We compared the model fitting results from four covariance approxima- 
tion methods: the predictive process, the independent blocks approximation, 
the FSA-Block and the FSA-Taper. As a benchmark for our comparison, we 
also fit the original full covariance model without using a covariance approx- 
imation. For the predictive process approximation, we used m = 225 knots. 
For the independent blocks approximation, we used K = 2>Q blocks. For the 
FSA-Block, we used m = 225 knots and K = 2>Q blocks. For the FSA-Taper, 
we used m = 225 and a spherical tapering function with taper range 7 = 10. 
The number of blocks for the FSA-Block and the taper range for the FSA- 
Taper are selected such that these two FSA methods lead to comparable 
approximations for the small scale residual covariance. For the above meth- 
ods, the knots were chosen as the centers from the X-means clustering, and 
the blocks were taken as equal-sized squares that form a partition of the 
[0, 100] X [0, 100] region. 

Table 2 displays the Bayesian posterior means and the corresponding pos- 
terior standard deviations for the model parameters under each approach. 
We observe that the diagonal values of A, the range parameter of C/i, and 
the nugget variance r^ are all overestimated by the predictive process. We 
also notice large biases of the parameter estimates using independent blocks 
approximation. The FSA-Block provides the most accurate parameter esti- 
mation among all the methods. 



18 H. SANG, M. JUN AND J. Z. HUANG 

Table 3 

Die scores and MSPEs for the full covariance model, the predictive process 

approximation, the independent blocks approximation, the FSA-Block and the FSA-Taper. 

MSPE-Random is based on a test data set of 200 locations that are randomly selected 

from [0,100] x [0,100]. MSPE-Hole is based on a test data set of size 200 that consists of 

160 randomly selected locations from [0, 100] x [0, 100] and 40 random locations within 

two circles: {ix,y);x^ + {y - 90)^ < 30} and {(x,y);{x - 50)^ + {y - 50f < 35} 
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To gauge the performance on model fitting for different approaches, we 
used the deviance information criterion [DIG, Spiegelhalter et al. (2002)], 
which is easily calculated from the posterior samples. From Table 3, we ob- 
serve that the benchmark full covariance model has the smallest DIG score, 
indicating the best model fitting. The FSA-Block approach gives a slightly 
larger DIG score than the full model, while the predictive process and the 
independent blocks approximation yield significantly larger DIG scores than 
the FSA and the full model. This result shows that the FSA performs much 
better than the predictive process and independent blocks approximation in 
terms of model fitting. 

To compare the methods in terms of prediction performance, we com- 
puted the mean square prediction errors (MSPE) based on a simulated test 
data set of 200 locations using the previously simulated data set with ob- 
servations at 2000 locations as the training set. We experimented with two 
different kinds of test sets: one set consists of 200 locations randomly se- 
lected from [0, 100] x [0, 100] and another consists of 160 randomly selected 
locations from [0, 100] x [0, 100] and 40 random locations within two circles: 
{{x, y);x'^ + {y- 90)^ < 30} and {(x, y);{x- 50)^ + {y - 50)^ < 35}. The sec- 
ond interpolation scenario is common in practice where missing data often 
correspond to sizable gaps/holes. 

From Table 2, we see that both the FSA-Block and the FSA-Taper meth- 
ods lead to much more accurate predictions than the predictive process and 
the independent blocks approximation. In the scenario of predicting for miss- 
ing gaps/holes, the advantage of using the FSA approach over the other two 
approximation approaches is more significant. 

To compare the computation efficiency of the covariance approximations 
for larger data sets, we repeated the simulation study when the number 
of spatial locations in the training set is increased to 5,000 and the num- 
ber of locations in the test set to 1,000. We measured the MSPE and the 
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Fig. 1. The MSPE versus time plot for the simulation study in Section 4 under the 
predictive process (circle), the FSA-Block (plus) and the FSA-Taper (star). 



associated computational time based on the prediction at the 1,000 test 
locations. The test set consists of 800 randomly sampled locations from 
[0, 100] X [0, 100] and 200 locations randomly selected within two circles: 
{{x,y);x'^ + {y- 90)2 ^ 3q| ^.^^ {{x,y);{x - 50)^ + {y - 50)^ < 35}. The 
MSPE for each model was obtained by plugging in the true parameter val- 
ues into the BLUP equation. Pairs of the MSPE and the computational 
time were obtained by varying the knot intensity for the predictive process 
model, block size for the independent blocks approximation, knot intensity 
and block size for the FSA-Block approach, and knot intensity and taper 
range for the FSA-Taper approach. Figure 1 shows that both the FSA-Taper 
and the FSA-Block methods outperform the predictive process in terms 
of their computational efficiency for making predictions. The independent 
blocks approximation yields much larger MSPE than the other three ap- 
proaches given the same computational time and we decided not to include 
its results in Figure 1. It is also noticeable that the required computational 
time of the FSA-Block approach to obtain the same MSPE is much less than 
that of the FSA-Taper approach. For this reason, we used the FSA-Block 
approach to analyze the climate errors data in Section 5. 

We acknowledge that performance for the approximation approaches may 
depend on the characteristics of spatial data, such as sample size, pattern 
of sampling locations, spatial smoothness and spatial correlation range. To 
investigate the effect of spatial correlation range on the performance of the 
FSA approach, we conducted two other experiments: one with small range 
parameters 0i = 5 and 02 = 10, and the other with large range parameters 
4>i = 30 and (/>2 = 60. In the case of small spatial ranges, the independent 
blocks approximation performed better than the predictive process, while 
the FSA performed similarly to the independent blocks analysis. In the case 



20 H. SANG, M. JUN AND J. Z. HUANG 

Eirat eS Ciimate Mtkdal 1 Rftgrsssion Mean of Modal 1 Sblt Regression Mean) of Modal 1 




EttQf ol Ctimate Model 2 Regression Mean ot Model 2 Std( Regression Mean) ol Model 2 




Error of Ctimate Model 3 



Regression Mean of Model 3 Std( Regression Mean) of Model 3 




Error of CllmEite Model 4 Regression Mean ol Model 4 StdfRegresston Mean) of Model 4 




Error of Climate Model 5 Regression Mean of Model 5 Stdt Regression Mean) of Model 5 




D fl 4 • t 10 -It 



Fig. 2. The first column shows the surface maps of climate model errors. The second 
and the third columns show the estimated mean structure of each climate model error and 
the associated standard deviations. 



of large spatial ranges, the predictive process performed similarly to the 
FSA and both methods performed significantly better than the independent 
blocks approximation. These two experiments indicate that both the predic- 
tive process and the independent blocks approximation have their modes of 
successes and failures. Under their failure modes, to achieve accurate model 
inference and prediction, one has to use either a significantly large rank 
number m for the predictive process or a very small number of blocks K for 
the independent blocks approach, therefore, the computational advantages 
associated with a small m and a large K would disappear. In contrast, by 
the combining of a small m and a large K, the FSA-Block flexibly accom- 
modates data sets with either large scale or small scale spatial variations 
while still maintaining the computational efficiency. 



COVARIANCE APPROXIMATION 21 

5. Application result. To build a joint model that accounts for the cross- 
covariance structure of the multiple climate model errors, we fit and com- 
pared two versions of the LMC models, corresponding to spatially fixed and 
spatially varying transformation matrices, as described in Section 2.2. The 
LMC model specification depends on the ordering of the response variables 
because of the lower triangular specification of the transformation matrix. 
When applying the LMC model to the five climate model errors, we tried 
a few orderings of climate models and found that different orderings may 
produce different values of parameters but they produced fairly consistent 
estimates of variances and cross-correlations. Therefore, we chose to present 
the results based on the order given in Table 1. 

We applied the FSA-Block approach to facilitate the computation. We 
used 225 knots selected by the ii'-means clustering algorithm and divided 
the study region into 10 regions for subsequent analysis. We assumed the 
exponential spatial correlation function for each Uq{s), and assigned a uni- 
form prior [7(50,4,500) for each of the spatial range parameters given that 
the maximum chordal distance between any two locations is 12,757 km. For 
each of the coefficients /3 in the regression in model (1), we assumed inde- 
pendent normal priors with mean and variance 1,000. For the LMC model 
with a constant A, we assumed the diagonal entries to have truncated nor- 
mal distributions ranging from to oo and diagonal entries to have normal 
distributions, with their means being the empirical estimates of A and vari- 
ances 1,000. For the spatially varying LMC model with A(s), we assumed 
that the intercepts of the coefficients r/j • in the regression model for A(s) 
have normal distributions, with their means being the empirical estimates 
of A and variances being 1,000. We set normal priors with mean and 
variance 1,000 to the other coefficients of rj.^,:. For each model, we ran 3,000 
iterations of MCMC to collect posterior samples after a burn-in period of 
1,000 iterations, thinning every third iteration. 

The Die scores of two specifications for A were compared, one with a con- 
stant A and the other with spatially varying A depending on polynomials 
of latitude, land/ocean effect and altitude, as detailed in Section 2.2. The 
Die score for the model with a spatially varying A(s) is 9,901, which is 
much smaller than the score of 11,975 for the model with a constant A. 
This suggests that the spatially varying LMC model performs significantly 
better than the model with a constant A, as we expected. From now on, we 
present results based solely on the spatially varying LMC model. 

Figure 2 shows the estimated mean structure and its standard deviations 
obtained from the posterior samples. It is interesting to note a clear distinc- 
tion between model 5 and the rest of the models; for model 5, the estimated 
means are negative with large magnitudes over high altitude areas, whereas 
the rest gives large positive values for the estimated mean structure over high 
latitude areas. This finding is consistent with the result in Jun, Knutti and 
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Nychka (2008b) [note that the model numbers in this paper and the model 
numbers in Jun, Knutti and Nychka (2008b) are different, although all the 
models used in this paper are also used in Jun, Knutti and Nychka (2008b)] . 
Recall that the model error is calculated by subtracting the model output 
from the observation. The above result suggests that models 1-4 may under- 
estimate the mean state of the surface temperature over the high-altitude 
and high-latitude regions, and model 5 may overestimate the mean state 
over the high-altitude area. The estimated mean structure and its associ- 
ated standard deviations are quite similar for the models developed by the 
same groups — models 1 and 2 from the GFDL group of NOAA, and mod- 
els 3 and 4 from the Hadley Centre in the UK. The spatial patterns of the 
associated standard deviations for those pairs are also quite similar. For all 
the models, altitude is responsible for the dominant effects in the fixed part 
of the process and we also see some effects of latitude, although longitude 
does not seem to be significant in the mean structure. Models 3-5 seem to 
have large errors in the high-latitude and sea-ice area. The indicator for the 
land and the ocean is not significant for the mean structure. This may be 
due to the fact that we already include altitude as a covariate (altitude is 
positive over the land and zero over the ocean). 

Given the posterior samples of the model parameters, we draw sam- 
ples of the cross-covariance matrix at each location s based on rw(s,s) = 
A(s)A-^(s). The diagonals of rw(s, s) are the variances of the climate model 
errors at location s. Figure 3 shows the maps of standard deviations for each 
model error and the standard deviations of the standard deviations obtained 
from the posterior samples. The patterns throughout all 5 models are quite 
consistent; high standard deviations at high altitudes, and standard devi- 
ations are higher over the land than over the ocean. For models 1 and 3, 
latitude seems to be a significant factor. For all the models, given the values 
of the posterior sample standard deviations of the standard deviations, the 
spatial patterns that we observe (such as high variances over high altitude 
or high latitude area) are statistically significant and are not due to random 
variations in the data. The sea-ice region is one of the places that models in 
general have trouble. Although we do not have a factor for sea-ice region in 
the model, we see high standard deviations around sea-ice area. This may 
be due to the interaction between latitude and the indicator for the land 
and the ocean. As we expected, model 5 has significantly larger values of 
standard deviations, especially in high-altitude areas, compared to the rest 
of the models. The standard deviations of the error of model 5 over the Hi- 
malayan area are more than 10. Their associated posterior sample standard 
deviations are also quite large. Overall, we observe that standard deviations 
of model errors are slightly higher over the land than over the ocean. 

Now we discuss the estimates of cross-correlations of different climate 
model errors. Figure 4 gives the spatial maps of cross-correlations between 
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Fig. 3. Surface maps of the estimated standard deviations of each climate model (left 
column) and the associated standard deviations (right column). 



each pair of climate model errors, obtained from the posterior samples. The 
corresponding standard deviation of the estimated cross-correlations using 
the posterior samples are given in Figm'e 5. In both figures, we use the same 
color scale across pairs of models to make the comparison easier. First, 
notice that overall the correlation between models 1 and 2, two models 
developed by the GFDL group of NOAA, is the highest. For models 1 and 2, 
overall the correlation values are strikingly high and the associated standard 
deviation values are close to zero. The maximum correlation value over the 
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Fig. 4. Surface maps of the estimated cross-correlation for each pair of climate model 
errors. 



entire domain is 0.769, the average of correlations over the ocean is 0.732 
with the standard deviation 0.017, and the average over the land is 0.639 
with the standard deviation 0.021. Jun, Knutti and Nychka (2008b) also 
report the highest level of cross-correlation for this pair of models. The 
cross-correlations between models 3 and 4 (two models developed by the 
Hadley Centre in the UK) are not as high as those between models 1 and 2 
and they are comparable to the cross-correlations between any one of the 
models 1 and 2 and any one of the models 3 and 4. In addition, the cross- 
correlation structure between models 1 and 2 shows different spatial patterns 



COVARIANCE APPROXIMATION 



25 



Std(Corr) of Model 1&2 



Std(Co[T) of Model 3M 




Std(CorT) of Model 1&3 




Std(Corr; or Model 2&3 




Std(CoiT) of Model 1&5 




Std(Corr) of Model 3&S 





Std(Corr) of Model 1&4 



&td(Corr) of Model 2fi4 




■■■^. \^ 



Std(Corr) of Model 2Si5 



'^./' 




Std(Corr) of Model 4&5 




Fig. 5. Standard deviation of the estimated cross-correlation for each pair of climate 
model errors in Figure 4- 



compared with that between models 3 and 4. Models 1 and 2 have quite small 
correlation over high-altitude area, meaning that the two models disagree 
over high-altitude areas, whereas models 3 and 4 have consistently large 
correlations over the entire domain. For models 3 and 4, the average of 
correlations over the land and the ocean are both slightly larger than 0.4. 
Cross-correlations between model 5 and the rest of the models are quite small 
in magnitude and the patterns are consistent for all the 4 pairs. Correlations 
between model 5 and the rest of the models are quite small over the land, 
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over high latitudes in the Northern Hemisphere and over low latitude area 
in the Southern Hemisphere. From these maps, it is clear that model 5 is 
significantly different from the rest of the models and this agrees with the 
result in Jun, Knutti and Nychka (2008b). In many pairs of models, we see 
clear effects of the indicator for the land and the ocean, and the latitude. 

In addition to the cross-correlations at the same location, our method 
allows us to examine both marginal and cross-covariances/correlations be- 
tween two different locations, based on the posterior samples. Given that the 
cross-correlations at the same location are quite different over the land and 
the ocean, we examined the spatial correlations against spatial distance over 
the land and the ocean separately. Figure 6 gives both marginal and cross- 
correlations against chordal distance (unit: km) for pairs of models based 
on the 1,656 observational grid points. Since we adopted exponential spa- 
tial correlation functions for the latent spatial processes f/q(s)'s in the LMC 
model with the lower triangular structure for A(s), the fitted correlation 
function for model 1 is isotropic. We also note that marginally the model 
errors 2, 3 and 4 give similar spatial correlation structures over the land and 
the ocean, while model 5 shows mild discrepancy between the correlations 
over the land and the ocean. Moreover, the spatial correlation structure of 
model 5 exhibits a slower spatial decay compared with the rest of the climate 
model errors. Models developed by the same group (i.e., pairs 1 and 2, and 3 
and 4) give similar spatial cross-correlation structure for both over the land 
and over the ocean. The cross-correlations between model 5 and the rest of 
the models are close to zero over the land, while those over the ocean are 
significantly different from zero. 

As in Jun, Knutti and Nychka (2008b) and Sain, Furrer and Cressie 
(2011), we can perform the analysis for the summer season average of North- 
ern Hemisphere (JJA, June-July-August average) in the same way that we 
did for the DJF averages. We do not include those results for brevity of the 
paper. 

We have also implemented the spatially varying LMC using the predictive 
process with the same knots as the FSA {m = 200) for comparison purposes. 
The Die score of the predictive process model is 35,526, which is much larger 
than the DIG score of 9,901 of the FSA model, indicating that the FSA 
model has a better model fitting for the data. We did not observe significant 
difference in the estimations for the spatial range parameters. We present 
in Figure 7 the correlation maps between models 1 and 3 to illustrate the 
difference between the results of the two approaches. Although, for this real 
data analysis, the true correlations between models 1 and 3 are unknown, the 
results obtained from the FSA approach seem to be more reasonable than 
those from the predictive process. For instance, the map using the FSA 
approach clearly shows that the correlations over the ocean are in general 
higher than the correlations over the land, as expected based on previous 
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Fig. 6. Spatial correlations for pairs of model errors against chordal distance (unit: km). 
Due to the nonstationarity of the correlation structure, we display the correlation in the 
following way. For each pair of model errors, we first obtain a 1,656 x 1,656 cross (or 
marginal)-correlation matrix given from the posterior mean of the correlations. Then we 
calculate the averages, the 10th and the 90th percentiles of the correlations within each bin, 
with 30 equally spaced bins from distance to 5, 000 km, separately over the land and the 
ocean. Solid lines connect the binned averages and dashed lines connect the 10th and the 
90th percentiles. 
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Fig. 7. Correlation maps and the associated standard deviation maps between models 1 
and 3. The upper panel shows the maps using the FSA-Block approach and the bottom 
panel shows the maps using the predictive process. 



studies. The map using tlie predictive process fails to show this pattern. The 
poor performance of the predictive process in this example might be due to 
the inadequate number of knots and hence can be remedied by increasing 
the knots intensity. However, increasing the number of knots will greatly 
increase the computational time of the predictive process. 



6. Discussion. We built a joint spatial model for multivariate climate 
model errors accounting for their cross-covariances through a spatially vary- 
ing LMC model. To facilitate Bayesian computation for large spatial data 
sets, we developed a covariance approximation method for multivariate spa- 
tial processes. This full-scale approximation can capture both the large scale 
and small scale spatial dependence and correct the bias problem of the pre- 
dictive process. 

Our empirical results confirmed that pairs of climate models developed 
by the same group have high correlations and climate models in general 
have correlated errors. We also showed that some climate models are very 
different from other climate models and, thus, the cross-correlations between 
them are quite small. 

In principle, we could combine multiple climate model outputs as a weigh- 
ted linear combination, along the same lines as Sain and Furrer (2010), based 
on our modeling approach rather than the Bayesian Gaussian MRF model 
described in Section 4 of Sain and Furrer (2010). However, recently there 
have been several papers that raise concerns about the practice of com- 
bining multiple climate model outputs through model weighting or ranking 
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[see, e.g., Knutti et al. (2010b); Knutti (2010); Weigel et al. (2010)]. The 
main concern is the difficulty of interpreting the weighted average of chmate 
models physically. Knutti et al. (2010a) suggest that if model ranking or 
weighting is applied, both the quality metric and the statistical framework 
used to construct the ranking or weighting should be recognized. To deter- 
mine what is the best quality metric in weighting or ranking the climate 
models is a challenging problem. 

We focus on the climate model errors in this paper. It would be interesting 
to build more elaborate joint models of multiple climate model outputs 
as well as observations at their original spatial grid resolutions. To follow 
this path, we need to have a statistical representation of the true climate, 
which is very challenging. One possibility is to model the observation as the 
truth, with measurement errors assumed with a simple covariance structure. 
However, this assumption might not be realistic considering the relatively 
complex nature of biases and errors in the climate observations. 

In this paper we addressed only the spatial aspect of the problem and we 
applied our methodology to the mean state of the climate variable. It would 
be interesting to extend our approach to spatio-temporal problems and, in 
particular, to consider the climate model outputs in their original time scale 
of monthly averages or annual averages for studying the trend of the climate 
model errors. 
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